Multimodal HMM-based NAM-to-speech conversion

نویسندگان

Viet-Anh Tran

Gérard Bailly

Hélène Loevenbruck

Tomoki Toda

چکیده

Although the segmental intelligibility of converted speech from silent speech using direct signal-to-signal mapping proposed by Toda et al. [1] is quite acceptable, listeners have sometimes difficulty in chunking the speech continuum into meaningful words due to incomplete phonetic cues provided by output signals. This paper studies another approach consisting in combining HMM-based statistical speech recognition and synthesis techniques, as well as training on aligned corpora, to convert silent speech to audible voice. By introducing phonological constraints, such systems are expected to improve the phonetic consistency of output signals. Facial movements are used in order to improve the performance of both recognition and synthesis procedures. The results show that including these movements improves the recognition rate by 6.2% and a final improvement of the spectral distortion by 2.7% is observed. The comparison between direct signal-to-signal and phonetic-based mappings is finally commented in this paper.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on the independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a Maximum a posterior (MAP) estimator based on Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

متن کامل

Using Teager Energy Cepstrum and HMM distancesin Automatic Speech Recognition and Analysis of Unvoiced Speech

In this study, the use of silicon NAM (Non-Audible Murmur) microphone in automatic speech recognition is presented. NAM microphones are special acoustic sensors, which are attached behind the talker’s ear and can capture not only normal (audible) speech, but also very quietly uttered speech (non-audible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems...

متن کامل

Improving body transmitted unvoiced speech with statistical voice conversion

The conversion method from Non-Audible Murmur (NAM) to ordinary speech based on the statistical voice conversion (NAM-toSpeech) has been proposed towards realization of “silent speech telephone.” Although NAM-to-Speech converts NAM to intelligible voices with similar quality to speech, there is still a large problem, i.e., difficulties of the F0 estimation from unvoiced speech. In order to avoi...

متن کامل

Towards Augmentative Speech Communication

Speech is the most natural form of communication for human beings and is often described as a unimodal communication channel. However, it is well known that speech is multimodal in nature and includes the auditive, visual, and tactile modalities. Other less natural modalities such as electromyographic signal, invisible articulator display, or brain electrical activity or electromagnetic activit...

متن کامل

Reducing over-smoothness in HMM-based speech synthesis using exemplar-based voice conversion

Speech synthesis has been applied in many kinds of practical applications. Currently, state-of-the-art speech synthesis uses statistical methods based on hidden Markov model (HMM). Speech synthesized by statistical methods can be considered over-smooth caused by the averaging in statistical processing. In the literature, there have been many studies attempting to solve over-smoothness in speech...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2009

Multimodal HMM-based NAM-to-speech conversion

نویسندگان

چکیده

منابع مشابه

Speech enhancement based on hidden Markov model using sparse code shrinkage

Using Teager Energy Cepstrum and HMM distancesin Automatic Speech Recognition and Analysis of Unvoiced Speech

Improving body transmitted unvoiced speech with statistical voice conversion

Towards Augmentative Speech Communication

Reducing over-smoothness in HMM-based speech synthesis using exemplar-based voice conversion

عنوان ژورنال:

اشتراک گذاری